Iterative Policy-Space Expansion in Reinforcement Learning
Humans and animals solve a difficult problem much more easily when they are presented with a sequence of problems that starts simple and slowly increases in difficulty. We explore this idea in the context of reinforcement learning. Rather than being given an externally provided curriculum of progressively more difficult tasks, the agent solves a single task using a decreasingly constrained policy space. The algorithm we propose first learns to categorize features as positive or negative before gradually learning a more refined policy. Experimental results in Tetris demonstrate a superior learning rate of our approach compared to existing algorithms. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 66599
Comment: Workshop on Biological and Artificial Reinforcement Learning at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
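A minimal sketch of the two-phase idea above, assuming a linear evaluation policy over hand-crafted features and a simple least-squares surrogate for the learning rule (both are assumptions; the paper's actual algorithm may differ):

    import numpy as np

    def learn_feature_signs(features, returns):
        # Phase 1: restrict the policy space to sign-only weights by
        # classifying each feature as positive or negative (assumed
        # criterion: the sign of its correlation with observed returns).
        corr = [np.corrcoef(features[:, j], returns)[0, 1]
                for j in range(features.shape[1])]
        return np.sign(corr)

    def refine_policy(features, returns, signs, lr=0.01, steps=1000):
        # Phase 2: expand the policy space to continuous weights,
        # initialised from the coarse sign-only policy learned above.
        w = signs.astype(float)
        for _ in range(steps):
            grad = features.T @ (features @ w - returns) / len(returns)
            w -= lr * grad
        return w

Constraining the weights to {-1, +1} first gives the agent a crude but quickly learnable policy; relaxing the constraint afterwards recovers the full policy space without starting from scratch.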
Regularization in Directable Environments with Application to Tetris
Learning from small data sets is difficult in the absence of specific domain knowledge. We present a regularized linear model called STEW that benefits from a generic and prevalent form of prior knowledge: feature directions. STEW shrinks weights toward each other, converging to an equal-weights solution in the limit of infinite regularization. We provide theoretical results on the equal-weights solution that explain how STEW can productively trade off bias and variance. Across a wide range of learning problems, including Tetris, STEW outperformed existing linear models, including ridge regression, the Lasso, and the non-negative Lasso, when feature directions were known. The model proved robust to unreliable (or absent) feature directions, still outperforming alternative models under diverse conditions. Our results in Tetris were obtained using a novel approach to learning in sequential decision environments based on multinomial logistic regression. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 66599
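The shrinkage described above admits a compact formulation. A plausible form of the objective, consistent with the abstract (the paper's exact notation may differ), is the squared loss plus a penalty on all pairwise weight differences:

    \hat{w} = \arg\min_{w \in \mathbb{R}^p}
        \lVert y - Xw \rVert_2^2 + \lambda \sum_{i < j} (w_i - w_j)^2

As λ → 0 this reduces to ordinary least squares; as λ → ∞ the pairwise differences are driven to zero and the solution collapses to the equal-weights rule, with the known feature directions absorbed into the signs of the columns of X beforehand. Because the penalty is quadratic, the estimator retains a ridge-like closed-form solution.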
Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study
We explore the colour versus shape goal misgeneralization originally demonstrated by Di Langosco et al. (2022) in the Procgen Maze environment, where, given an ambiguous choice, agents seem to prefer generalization based on colour rather than shape. After training over 1,000 agents in a simplified version of the environment and evaluating them on over 10 million episodes, we conclude that the behaviour can be attributed to the agents learning to detect the goal object through one specific colour channel, a choice that is arbitrary. Additionally, we show how, due to underspecification, these preferences can change when the agents are retrained using exactly the same procedure except for the random seed of the training run. Finally, we demonstrate the existence of outliers in out-of-distribution behaviour that arise from the training random seed alone.
Comment: ATTRIB: Workshop on Attributing Model Behavior at Scale at NeurIPS 2023
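The seed dependence reported above can be illustrated with a self-contained toy example (not the Procgen setup): when two cues are perfectly redundant during training, gradient descent never needs to choose between them, and the tie is broken by the seed-dependent initialisation alone.

    import numpy as np

    def train_linear(seed, n=200, steps=2000, lr=0.1):
        # Two redundant cues, "colour" and "shape", both perfectly
        # predict the target during training (an underspecified task).
        rng = np.random.default_rng(seed)
        cue = rng.integers(0, 2, n) * 2.0 - 1.0
        X = np.column_stack([cue, cue])   # colour == shape in training
        y = cue
        w = rng.normal(size=2)            # seed-dependent initialisation
        for _ in range(steps):
            w -= lr * X.T @ (X @ w - y) / n
        return w

    for seed in range(5):
        w = train_linear(seed)
        favoured = "colour" if abs(w[0]) > abs(w[1]) else "shape"
        print(f"seed {seed}: weights {w.round(2)}, favours {favoured}")

Because both coordinates receive identical gradients, their initial difference, which is set only by the seed, is preserved throughout training and decides which cue the model follows once the cues disagree at test time.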
Betweenness Centrality as a Basis for Forming Skills
We show that betweenness centrality, a graph-theoretic measure widely used in social network analysis, provides a sound basis for autonomously forming useful high-level behaviors, or skills, from available primitives: the smallest behavioral units available to an autonomous agent.
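A minimal sketch of the measure using networkx (how the paper then builds skills around these states is elided here):

    import networkx as nx

    def subgoal_candidates(transition_graph, k=5):
        # Rank states by betweenness centrality; states that many
        # shortest paths pass through are candidate subgoals around
        # which skills (options) can be formed.
        scores = nx.betweenness_centrality(transition_graph)
        return sorted(scores, key=scores.get, reverse=True)[:k]

    # Example: two densely connected "rooms" joined by a doorway node;
    # the doorway receives the highest betweenness score.
    G = nx.barbell_graph(5, 1)
    print(subgoal_candidates(G, k=3))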
Using Relative Novelty to Identify Useful Temporal Abstractions in Reinforcement Learning
We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and propose a method for identifying them using the concept of relative novelty. When such a state is identified, a temporally extended activity (e.g., an option) is generated that takes the agent efficiently to this state. We illustrate the utility of the method in a number of tasks.
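A hedged sketch of such a score (the window size and the visit-count novelty measure below are assumptions; the paper's exact definitions may differ):

    from collections import defaultdict

    def relative_novelty(trajectory, window=5):
        # Novelty of a state is assumed to decay with its visit count,
        # here n(s) ** -0.5.
        visits = defaultdict(int)
        novelty = []
        for s in trajectory:
            visits[s] += 1
            novelty.append(visits[s] ** -0.5)

        # Relative novelty compares how novel the trajectory is just
        # after visiting a state to just before; peaks mark candidate
        # subgoals such as doorways between regions of the state space.
        scores = {}
        for t in range(window, len(trajectory) - window):
            before = sum(novelty[t - window:t])
            after = sum(novelty[t + 1:t + 1 + window])
            s = trajectory[t]
            scores[s] = max(scores.get(s, 0.0), after / before)
        return scores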
Creating Multi-Level Skill Hierarchies in Reinforcement Learning
What is a useful skill hierarchy for an autonomous agent? We propose an answer based on a graphical representation of how the interaction between an agent and its environment may unfold. Our approach uses modularity maximisation as a central organising principle to expose the structure of the interaction graph at multiple levels of abstraction. The result is a collection of skills that operate at varying time scales, organised into a hierarchy, where skills that operate over longer time scales are composed of skills that operate over shorter time scales. The entire skill hierarchy is generated automatically, with no human intervention, including the skills themselves (their behaviour, when they can be called, and when they terminate) as well as the hierarchical dependency structure between them. In a wide range of environments, this approach generates skill hierarchies that are intuitively appealing and that considerably improve the learning performance of the agent.
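A minimal sketch of the central organising principle using networkx (how the skills themselves, their initiation conditions, and their termination conditions are derived from the partition is elided here):

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    def community_hierarchy(graph, min_size=4):
        # Recursively partition the interaction graph by modularity
        # maximisation; each level of the resulting tree corresponds
        # to skills operating at a coarser time scale than the level below.
        communities = greedy_modularity_communities(graph)
        if len(communities) <= 1:
            return sorted(graph.nodes)
        return [
            community_hierarchy(graph.subgraph(c)) if len(c) > min_size
            else sorted(c)
            for c in communities
        ]

    G = nx.barbell_graph(6, 2)
    print(community_hierarchy(G))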
- …